Beyond Linear Arrays: Scaling to Multidimensional Data
AI032 Lesson 4

Welcome to The Great Handover. In CPU programming, we define how to iterate; in GPGPU, we define what an iteration looks like. This shift from instruction-centric to data-centric logic is powered by the Kernel Abstraction.

1. The __global__ Blueprint

By using the __global__ qualifier, you are not writing an ordinary function—you are designing a scalable blueprint. The kernel body describes one standalone unit of work; a single launch replicates that unit across thousands of threads, letting the GPU orchestrate identical tasks over its massive core count without manual thread management.
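As a minimal sketch of this idea (the kernel name and single-block shape are illustrative, not from the lesson), note that the body never loops over the array—it only says what one thread does to one element:

```cuda
#include <cuda_runtime.h>

// Blueprint: ONE unit of work. The GPU stamps out one copy of this
// body per thread; no loop over the data appears anywhere.
__global__ void addOne(float *data)
{
    // Single-block sketch: each thread handles the element whose
    // position matches its own thread index.
    data[threadIdx.x] += 1.0f;
}
```

Launching this with N threads performs N additions; the "how to iterate" is gone from the code entirely, which is exactly the instruction-centric to data-centric shift described above.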

2. The Global Address Resolver

How does a single thread among millions find its target? It uses a deterministic contract known as the indexing formula:

$$\text{threadID} = \text{blockIdx.x} \times \text{blockDim.x} + \text{threadIdx.x}$$

This formula acts as a coordinate system, bridging the software's logical data (the array) to the hardware's physical hierarchy (blocks and threads).

[Diagram: a 10M-element array in global memory partitioned across Block 0, Block 1, …, Block N−1; a thread in Block 1, for example, resolves its element as index = 1 × blockDim + threadIdx]

3. Execution Configuration

The <<<B, T>>> launch parameters define the grid shape: B blocks of T threads each. This ensures Transparent Scalability: the hardware scheduler decides how many blocks run concurrently, so your code executes identical logic whether the GPU has 2 SMs or 80 SMs.
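A host-side sketch of choosing B and T (the sizes and kernel are illustrative assumptions; error checking is omitted for brevity). The key idiom is the ceiling division that guarantees enough blocks to cover every element:

```cuda
#include <cuda_runtime.h>

// Same kernel pattern as above: one guarded unit of work per thread.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main(void)
{
    const int N = 10000000;          // 10M elements, as in the diagram
    const int T = 256;               // threads per block (a common choice)
    const int B = (N + T - 1) / T;   // ceiling division: cover all N elements

    float *d_data;
    cudaMalloc(&d_data, N * sizeof(float));

    // The grid shape is data-derived; the hardware maps blocks to SMs,
    // so this identical launch runs unchanged on 2 SMs or 80 SMs.
    scale<<<B, T>>>(d_data, 2.0f, N);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```

Because B is computed from N rather than from the device's SM count, scaling to a bigger GPU changes only how many blocks run at once, never the source code.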
